fix: adapt SDK storage layer to crawlee v4 StorageClient interface #595

Open
B4nan wants to merge 18 commits into fix/event-manager-v4-adapt from fix/storage-client-v4-adapt

@B4nan (Member) commented Apr 30, 2026

Summary

Crawlee v4 reshaped the StorageClient interface, removed the cached storageObject from KeyValueStore, and made getPublicUrl async. The SDK still targeted the v3 shape and no longer compiled. This PR adapts:

  • New ApifyStorageClient adapter wrapping apify-client's legacy dataset()/keyValueStore()/requestQueue() accessors and exposing the factory methods (createDatasetClient / createKeyValueStoreClient / createRequestQueueClient) crawlee now requires. Name → ID resolution goes through each collection's getOrCreate(name).
  • Actor.init and _openStorage now wrap this.apifyClient in the adapter before handing it to crawlee/StorageManager.
  • KeyValueStore.getPublicUrl is now async; the per-store urlSigningSecretKey is fetched on demand via the (private) client.getMetadata() rather than the removed storageObject cache. URL-signing behaviour in platform mode is preserved.
  • Actor.openRequestQueue reads totalRequestCount via client.getMetadata() (the old client.get() is gone).
  • StorageManager.openStorage signature is now (class, id?, client?) — dropped the trailing this.config argument.
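
The name → ID resolution in the first bullet could look roughly like the sketch below. It is not the adapter from this PR: `LegacyApifyClient`, `CreateClientOptions` and `SketchStorageClientAdapter` are hypothetical, trimmed-down stand-ins, and throwing when neither `id` nor `name` is given is a choice made only for this sketch.

```typescript
// Hypothetical stand-ins for the small part of apify-client used here.
interface LegacyDatasetClient {
    readonly id: string;
}

interface LegacyApifyClient {
    // Legacy accessor: returns a resource client for a known ID.
    dataset(id: string): LegacyDatasetClient;
    // Collection client: resolves (or creates) a storage by name.
    datasets(): { getOrCreate(name?: string): Promise<{ id: string }> };
}

// Assumed shape of what the v4-style factory receives: an `id` or a `name`.
interface CreateClientOptions {
    id?: string;
    name?: string;
}

class SketchStorageClientAdapter {
    private readonly client: LegacyApifyClient;

    constructor(client: LegacyApifyClient) {
        this.client = client;
    }

    async createDatasetClient(options?: CreateClientOptions): Promise<LegacyDatasetClient> {
        // Prefer an explicit ID; otherwise resolve the name through getOrCreate().
        const id =
            options?.id ??
            (options?.name
                ? (await this.client.datasets().getOrCreate(options.name)).id
                : undefined);

        if (id === undefined) {
            // Sketch-only behaviour; the real adapter may handle this case differently.
            throw new Error('Either `id` or `name` is required in this sketch.');
        }
        return this.client.dataset(id);
    }
}
```

The key-value store and request queue factories would presumably follow the same pattern against their own collection clients.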

Known gaps tracked in this PR

apify-client's DatasetClient / KeyValueStoreClient / RequestQueueClient don't yet implement v4-added members like getMetadata and getRecordPublicUrl. The adapter casts through with as unknown as ... for now — the proper fix is to bring apify-client's resource client shapes into structural alignment with @crawlee/types, which is out of scope here.
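
To illustrate why the cast is only a stopgap, here is a minimal sketch with hypothetical, trimmed-down shapes (`LegacyKvsResourceClient`, `V4KvsClientShape`); the real interfaces have many more members. The double cast satisfies the compiler, but the missing member is still absent at runtime.

```typescript
// Hypothetical shapes for illustration only.
interface LegacyKvsResourceClient {
    getRecord(key: string): Promise<unknown>;
}

interface V4KvsClientShape extends LegacyKvsResourceClient {
    getMetadata(): Promise<{ id: string }>;
}

// The double cast hides the structural mismatch at compile time only.
function castToV4(legacy: LegacyKvsResourceClient): V4KvsClientShape {
    return legacy as unknown as V4KvsClientShape;
}

const legacyKvsClient: LegacyKvsResourceClient = {
    getRecord: async () => undefined,
};
const castedKvsClient = castToV4(legacyKvsClient);

// This type-checks, yet the member does not exist on the underlying object.
const hasGetMetadata = typeof castedKvsClient.getMetadata === 'function';
```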

The KeyValueStoreInfo type in @crawlee/types also doesn't declare urlSigningSecretKey, so the SDK uses a local intersection type to access it. Worth surfacing upstream eventually.
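
A minimal sketch of the local intersection type and the on-demand lookup, under stated assumptions: `SketchKeyValueStoreInfo` stands in for the upstream type, the URL shape and the `signature` query parameter are placeholders, and the signing routine is injected as `sign` because the actual signing code is not shown in this thread.

```typescript
// Simplified stand-in for `KeyValueStoreInfo`; the real type has more fields.
interface SketchKeyValueStoreInfo {
    id: string;
    name?: string;
}

// Local intersection type: adds the field the upstream type does not declare.
type InfoWithSigningKey = SketchKeyValueStoreInfo & { urlSigningSecretKey?: string };

// Hypothetical minimal client exposing only what this sketch needs.
interface SketchKeyValueStoreClient {
    getMetadata(): Promise<SketchKeyValueStoreInfo>;
}

async function sketchGetPublicUrl(
    client: SketchKeyValueStoreClient,
    key: string,
    sign: (secret: string, key: string) => string,
): Promise<string> {
    // Fetch metadata on demand instead of reading a cached `storageObject`.
    const info = (await client.getMetadata()) as InfoWithSigningKey;

    // Placeholder URL shape, not the real platform endpoint.
    const base = `https://example.invalid/key-value-stores/${info.id}/records/${encodeURIComponent(key)}`;

    // Only append a signature when the store actually has a secret key.
    return info.urlSigningSecretKey
        ? `${base}?signature=${encodeURIComponent(sign(info.urlSigningSecretKey, key))}`
        : base;
}
```

One consequence of this approach is that every call performs a metadata request; caching the secret per store would be a possible follow-up.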

Stacking

Depends on #583 (config redesign). Rebases cleanly onto v4 once that lands.

@B4nan B4nan force-pushed the fix/storage-client-v4-adapt branch from fba9532 to 777f5d6 Compare April 30, 2026 19:14
@B4nan B4nan changed the base branch from v4 to fix/event-manager-v4-adapt May 6, 2026 09:26
@barjin (Member) left a comment

A few ideas, but lgtm overall. Thanks!

@janbuchar might have more ideas as the original author

Comment on lines +28 to +32
const id =
    options?.id ??
    (options?.name
        ? (await this.client.datasets().getOrCreate(options.name)).id
        : undefined);

Note that if we implement storageExists, Crawlee should be able to resolve the identifiers on its own(?)

   * implementation (which produces a `file://` URL or returns `undefined`).
   */
-  override getPublicUrl(key: string): string {
+  override async getPublicUrl(key: string): Promise<string | undefined> {

Maybe this change closes this issue - I'm not sure if the renaming was a requirement, or a way to make this BC.

Comment on lines +40 to +42
// `client` is `private` on `CoreKeyValueStore`; bypass the visibility
// check to fetch the per-store secret. There is no public crawlee API
// surface for this yet — track upstream exposure as a follow-up.

Good point, please create the issue in Crawlee (I see that Dataset and RequestProvider already both have client public, so it's weirdly asymmetrical now).

@janbuchar (Contributor)

@B4nan did you check parity with apify-sdk-python? I assume that there will be gaps in the request queue implementation, which is fine at this point.

@B4nan B4nan force-pushed the fix/event-manager-v4-adapt branch from f5d0e03 to 94e8aa4 Compare May 11, 2026 16:42
@B4nan B4nan force-pushed the fix/storage-client-v4-adapt branch 2 times, most recently from f271efa to 83afcc1 Compare May 11, 2026 17:06
@B4nan B4nan force-pushed the fix/event-manager-v4-adapt branch from 94e8aa4 to c78bc3d Compare May 11, 2026 17:06
@B4nan B4nan force-pushed the fix/storage-client-v4-adapt branch from 83afcc1 to 5bd8f93 Compare May 11, 2026 17:28
@B4nan B4nan force-pushed the fix/event-manager-v4-adapt branch 2 times, most recently from f8aa684 to 076c1eb Compare May 11, 2026 17:35
@B4nan B4nan force-pushed the fix/storage-client-v4-adapt branch from 5bd8f93 to e56ccdf Compare May 11, 2026 17:35
@B4nan B4nan force-pushed the fix/event-manager-v4-adapt branch from 076c1eb to cc2c4dd Compare May 12, 2026 11:24
@B4nan B4nan force-pushed the fix/storage-client-v4-adapt branch 2 times, most recently from 10b52d7 to cf11503 Compare May 12, 2026 11:41
@B4nan B4nan force-pushed the fix/event-manager-v4-adapt branch 2 times, most recently from 89d0f5c to b4ea1e5 Compare May 12, 2026 11:56
@B4nan B4nan force-pushed the fix/storage-client-v4-adapt branch from cf11503 to ef84e4e Compare May 12, 2026 11:56
@B4nan B4nan force-pushed the fix/event-manager-v4-adapt branch from b4ea1e5 to fd14806 Compare May 12, 2026 12:18
@B4nan B4nan force-pushed the fix/storage-client-v4-adapt branch from ef84e4e to 49e27d2 Compare May 12, 2026 12:18
B4nan added a commit that referenced this pull request May 12, 2026
@B4nan B4nan force-pushed the fix/event-manager-v4-adapt branch from fd14806 to c1f975a Compare May 12, 2026 12:30
@B4nan B4nan force-pushed the fix/storage-client-v4-adapt branch from 49e27d2 to 036a6a0 Compare May 12, 2026 12:30
B4nan added a commit that referenced this pull request May 12, 2026
@B4nan B4nan force-pushed the fix/event-manager-v4-adapt branch from c1f975a to 79c7560 Compare May 12, 2026 14:31
@B4nan B4nan force-pushed the fix/storage-client-v4-adapt branch 2 times, most recently from 9067d2f to 2083fb2 Compare May 12, 2026 14:32
B4nan added a commit that referenced this pull request May 12, 2026
@B4nan B4nan force-pushed the fix/event-manager-v4-adapt branch from dec3f6c to 43cf6a4 Compare May 12, 2026 14:37
@B4nan B4nan force-pushed the fix/storage-client-v4-adapt branch from 2083fb2 to 4001867 Compare May 12, 2026 14:37
B4nan added a commit that referenced this pull request May 12, 2026
B4nan and others added 17 commits May 12, 2026 16:58
…uration

crawlee 4.0.0-beta.56 ships `Configuration.reset()` (apify/crawlee#3649),
so the SDK's override can delegate to `super.reset()` instead of calling
`serviceLocator.reset()` directly. The SDK still owns clearing its own
`globalConfig` static and replacing the `AsyncLocalStorage` singleton.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Crawlee v4's `EventManager` constructor now requires
`EventManagerOptions` (just `persistStateIntervalMillis`), and the
base class no longer carries a `config` field — the previous
`override readonly config` pattern is no longer valid.

- Drop the `override` and store `config` as own readonly property.
- Forward `persistStateIntervalMillis` to `super()`.
- Add a `fromConfig()` factory mirroring `LocalEventManager.fromConfig()`
  so the SDK plays nicely with the new ServiceLocator-driven init path.

Stacked on #583 (config redesign); rebases onto v4 once that lands.
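
The three changes in this commit message can be sketched as follows. All names here (`SketchBaseEventManager`, `SketchConfig`, `SketchPlatformEventManager`) are hypothetical stand-ins, and the `fromConfig()` signature is an assumption rather than the SDK's actual one.

```typescript
// Assumed options shape, based on the description above.
interface SketchEventManagerOptions {
    persistStateIntervalMillis: number;
}

// Stand-in for the v4 base class: requires options, carries no `config` field.
class SketchBaseEventManager {
    readonly persistStateIntervalMillis: number;

    constructor(options: SketchEventManagerOptions) {
        this.persistStateIntervalMillis = options.persistStateIntervalMillis;
    }
}

interface SketchConfig {
    persistStateIntervalMillis: number;
    actorEventsWsUrl?: string;
}

class SketchPlatformEventManager extends SketchBaseEventManager {
    // Own readonly property; no `override`, since the base no longer declares it.
    readonly config: SketchConfig;

    constructor(config: SketchConfig) {
        // Forward the interval to the base constructor.
        super({ persistStateIntervalMillis: config.persistStateIntervalMillis });
        this.config = config;
    }

    // Factory so an init path can build the manager from a configuration object.
    static fromConfig(config: SketchConfig): SketchPlatformEventManager {
        return new SketchPlatformEventManager(config);
    }
}
```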
`Configuration.useEventManager()` was removed in crawlee v4. Install
the platform event manager via the global service locator instead, and
reset between tests so each case can register a fresh manager without
hitting `ServiceConflictError`.
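
A toy model of why the reset is needed between tests. `SketchServiceLocator` is hypothetical and much simpler than crawlee's `serviceLocator`; it only mimics the "second registration conflicts" behaviour described above.

```typescript
class SketchServiceConflictError extends Error {}

interface SketchEventManager {
    readonly label: string;
}

class SketchServiceLocator {
    private eventManager: SketchEventManager | undefined;

    setEventManager(manager: SketchEventManager): void {
        // Registering a second, different instance is treated as a conflict.
        if (this.eventManager !== undefined && this.eventManager !== manager) {
            throw new SketchServiceConflictError('Event manager is already registered.');
        }
        this.eventManager = manager;
    }

    getEventManager(): SketchEventManager | undefined {
        return this.eventManager;
    }

    // Called between tests so the next case can register a fresh manager.
    reset(): void {
        this.eventManager = undefined;
    }
}
```

In a test suite, `reset()` would typically run in a `beforeEach` or `afterEach` hook, before the case installs its own manager.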
Crawlee v4's `Configuration` is eager — `actorEventsWsUrl` is read
once at construction, so a global config that pre-existed the
`beforeEach` would never see the websocket URL we set, and
`events.init()` would silently never connect. Move the env-var setup
above `Configuration.getGlobalConfig()` and reset the SDK's static
singleton so each test rebuilds a fresh config.
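
The ordering problem can be reproduced with a small stand-in. `SketchEagerConfig` is hypothetical, and the environment is modelled as a plain object with an illustrative variable name rather than the real `process.env`.

```typescript
// Hypothetical eager configuration: reads its source exactly once, at construction.
class SketchEagerConfig {
    readonly actorEventsWsUrl: string | undefined;

    constructor(env: Record<string, string | undefined>) {
        this.actorEventsWsUrl = env['ACTOR_EVENTS_WEBSOCKET_URL'];
    }
}

const sketchEnv: Record<string, string | undefined> = {};

// Built before the variable is set: never sees the URL.
const configBuiltTooEarly = new SketchEagerConfig(sketchEnv);

sketchEnv['ACTOR_EVENTS_WEBSOCKET_URL'] = 'ws://localhost:9099/events';

// Rebuilt after the variable is set: picks the URL up.
const configRebuilt = new SketchEagerConfig(sketchEnv);
```

This is why the test setup has to set the variable first and then force a fresh configuration instance.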
Crawlee v4 reshaped its `StorageClient` interface (async factory
methods that accept `id` *or* `name`), removed the cached
`storageObject` from `KeyValueStore`, and made `getPublicUrl` async.
The existing SDK code targeted the v3 shape and no longer compiles.

Changes:
- New `ApifyStorageClient` adapter wraps `apify-client`'s legacy
  `dataset()/keyValueStore()/requestQueue()` accessors and exposes
  the `createDatasetClient/createKeyValueStoreClient/createRequestQueueClient`
  factories crawlee now expects. Names are resolved to IDs via the
  collection `getOrCreate(name)` calls. apify-client's resource
  clients don't yet implement v4-only members like `getMetadata` /
  `getRecordPublicUrl`; the adapter casts through with a TODO
  comment so the structural alignment can land separately upstream.
- `Actor.init` and `_openStorage` now wrap `this.apifyClient` in
  `ApifyStorageClient` before handing it to crawlee.
- `KeyValueStore.getPublicUrl` is now async; the per-store
  `urlSigningSecretKey` is fetched on demand via the (private)
  `client.getMetadata()` instead of the removed `storageObject`
  cache. URL-signing behaviour for platform-mode reads is preserved.
- `Actor.openRequestQueue` reads `totalRequestCount` via the new
  `client.getMetadata()` (the old `client.get()` was dropped).
- `StorageManager.openStorage` is now `(class, id?, client?)` —
  removed the trailing `this.config` argument.

Stacked on #583 (config redesign); rebases onto v4 once that lands.

Replace the removed `StorageManager.clearCache()` and
`Configuration.useStorageClient()` with `serviceLocator.reset()`
plus `serviceLocator.setStorageClient()`.
…apter

- `openRequestQueue should open storage`: mock client uses
  `getMetadata()` (the v3 `get()` was dropped on
  RequestQueueClient).
- Both Storage API tests assert that StorageManager.openStorage is
  called with an ApifyStorageClient (matched structurally) instead of
  the raw ApifyClient, since the SDK now wraps it for crawlee v4.

Actor.init() calls Configuration.storage.enterWith(this.config), which
sticks the resolved config onto the current async context and persists
across tests on Node 22 (but not Node 24+). The cached value short-
circuits Configuration.getGlobalConfig() so subsequent tests never see
the env vars they just set.

Reset the AsyncLocalStorage value alongside the other singletons in
the test emulator so addWebhook (and friends) see ACTOR_RUN_ID etc.
…eset

`Actor.init()` calls `Configuration.storage.enterWith(this.config)`,
which sets the AsyncLocalStorage value on whichever async context the
test runner happened to be on. `enterWith(undefined)` from a child
async branch (vitest's beforeEach) doesn't unwind that — on Node 22
the test body re-enters a sibling context where the original
`enterWith` is still in effect, so `getStore()` still returns the
stale Configuration even after our reset.

Swapping the entire `AsyncLocalStorage` instance for a fresh one
guarantees `getStore()` returns `undefined` for every async branch
that follows, fixing the addWebhook test failures on Node 22.
@B4nan B4nan force-pushed the fix/storage-client-v4-adapt branch from 4001867 to 112d8f2 Compare May 12, 2026 15:04
B4nan added a commit that referenced this pull request May 12, 2026
beta.56 (apify/crawlee#3584) renamed `StorageManager` →
`StorageInstanceManager` and reshaped the public storage open path.
The static `StorageManager.openStorage(cls, id, client)` helper is
gone; each storage class now exposes `static open(id, options?)` with
a `storageClient` option for routing through a custom backend.

- `actor.ts`: `_openStorage` now calls `storageClass.open(id, { storageClient })`
  instead of `StorageManager.openStorage(...)`. `StorageOpenOptions` replaces
  `StorageManagerOptions`.
- `key_value_store.ts`: import `StorageOpenOptions` for the `open()` override
  signature.
- `actor.test.ts`: the `openDataset` / `openRequestQueue` `forceCloud` tests
  now spy on the storage class's own `open()` (no more `StorageManager.prototype`),
  and assert the `storageClient` lives one level deeper in the options object.
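
A rough sketch of the new open path described above, using hypothetical stand-ins (`SketchDataset`, `SketchOpenOptions`, `sketchOpenStorage`) rather than the real crawlee classes. It shows the storage class's own static `open()` receiving the client inside an options object.

```typescript
interface SketchStorageClient {
    readonly kind: string;
}

interface SketchOpenOptions {
    storageClient?: SketchStorageClient;
}

const SKETCH_DEFAULT_CLIENT: SketchStorageClient = { kind: 'default' };

class SketchDataset {
    readonly id: string | undefined;
    readonly storageClient: SketchStorageClient;

    constructor(id: string | undefined, storageClient: SketchStorageClient) {
        this.id = id;
        this.storageClient = storageClient;
    }

    // Each storage class exposes its own static open(); no shared manager helper.
    static async open(id?: string | null, options: SketchOpenOptions = {}): Promise<SketchDataset> {
        return new SketchDataset(id ?? undefined, options.storageClient ?? SKETCH_DEFAULT_CLIENT);
    }
}

// SDK-side helper: routes the call through a custom backend via the options object.
async function sketchOpenStorage<T>(
    storageClass: { open(id?: string | null, options?: SketchOpenOptions): Promise<T> },
    id: string | undefined,
    storageClient: SketchStorageClient,
): Promise<T> {
    return storageClass.open(id, { storageClient });
}
```

A test spying on `open()` would therefore look for the client at `options.storageClient` instead of as a positional argument.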

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@B4nan B4nan force-pushed the fix/storage-client-v4-adapt branch from 112d8f2 to 5ea0f5e Compare May 12, 2026 15:50
B4nan added a commit that referenced this pull request May 12, 2026
…terface

Squashes the full content of #595 into a single commit
so the bundle PR shows a clean four-commit summary of the v4 catch-up
stack. See PR #595 for the per-commit history.
4 participants